Dyr og Data

Data visualisation — introduction

Gavin Simpson

Aarhus University

Mona Larsen

Aarhus University

2024-09-07

Learning objectives

At the end of this topic you should be able to

  • Understand the basic principles of data visualization

  • Understand how human brains perceive colour

  • Know which visual attributes or channels are most effective

  • Know how to represent different kinds of data visually

  • Have a basic understanding of how to produce visualizations using the ggplot2 package

Readings

Data Visualization

Some data visualizations are better than others

  • Taste
    • Beauty is in the eye of the beholder
  • Choice of data
  • Human visual perception

What makes bad figures?

Aesthetic

Tacky, tasteless, ugly, hodgepodge, inconsistent design

Substantive

Graph has problems because of the data being presented

Perceptual

In spite of good taste and good data, a graph may be confusing or misleading because of how people perceive and process what they are looking at

Always plot your data

  • Anscombe’s quartet
  • Scatterplot
    • Two quantities mapped to the x and y axes
  • \(x\) and \(y\) in each set have the same
    • mean
    • variance
  • Regression lines have same \(\hat{\beta}\)
  • \(x\) and \(y\) have the same correlation
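
The claims above can be checked directly. The following is a minimal sketch that assumes only base R and its built-in `anscombe` data frame (columns `x1`–`x4` and `y1`–`y4`); the object name `stats` is illustrative.

```r
# Summary statistics for each of the four x/y pairs in R's built-in `anscombe` data
stats <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  fit <- lm(y ~ x)                      # simple linear regression of y on x
  c(mean_x = mean(x), mean_y = mean(y),
    var_x  = var(x),  var_y  = var(y),
    cor_xy = cor(x, y),
    slope  = unname(coef(fit)[2]))      # estimated regression slope
})
colnames(stats) <- paste0("set", 1:4)

# One column per data set; the rows agree to about two decimal places
round(stats, 2)
```

The four columns are almost identical, yet a scatterplot of each set (for example `plot(anscombe$x2, anscombe$y2)`) looks completely different, which is the reason to always plot the data.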

Visualising data

Bad data

Perception

Mapping

Drawing a graph involves mapping data to visual attributes — some more effective than others
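
In ggplot2 this mapping is written explicitly with `aes()`. The sketch below is illustrative only: it assumes the ggplot2 package is installed and uses R's built-in `ChickWeight` data, which is a choice made here and not a data set named in these slides.

```r
library(ggplot2)

# ChickWeight ships with R: weight, Time, Chick (id) and Diet (a factor)
# Three mappings: Time -> x position, weight -> y position, Diet -> colour
p <- ggplot(ChickWeight, aes(x = Time, y = weight, colour = Diet)) +
  geom_point(alpha = 0.6)

p  # printing the object draws the plot
```

Position on a common scale carries the two quantitative variables, while colour hue carries the categorical one.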

Accuracy of Mappings

Colour models

  • Red Green Blue — RGB
    • Hexadecimal (base 16)
    • ‘0’–‘9’ with ‘A’–‘F’ for 10-15
    • Two hex digits encode 256 values (0–255) of each colour channel
    • In R "#AA6633"
    • TVs, Digital cameras, etc
  • Cyan Magenta Yellow Key (black) — CMYK
    • Generally covers a narrower range of colours than RGB displays
    • Magazines, printing
  • Hue Chroma Luminance — HCL
    • Hue — colour
    • Chroma — how much of the colour
    • Luminance — brightness
    • Designed to reflect human colour perception
  • Hue Saturation Lightness — HSL
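
As a small illustration of the RGB and HCL models, the sketch below uses only base R functions (`col2rgb()`, `strtoi()`, `rgb()`, `hcl()`). The HCL coordinates chosen at the end are arbitrary example values.

```r
hex <- "#AA6633"

# Decode the hex string into its red, green and blue channels (0-255)
channels <- col2rgb(hex)[, 1]
channels                                  # red 170, green 102, blue 51

# The same conversion by hand: each pair of hex digits is a base-16 number
strtoi(c("AA", "66", "33"), base = 16L)   # 170 102 51

# Rebuild the hex string from the three channel values
rebuilt <- rgb(channels[1], channels[2], channels[3], maxColorValue = 255)

# Specify a colour by hue, chroma and luminance instead (arbitrary values)
hcl_col <- hcl(h = 30, c = 50, l = 50)
```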

Colour palettes

A colour palette is the colour scheme or selection of colours used to represent data or design on a graph

Want more than a numerical mapping — want perceptually uniform mappings

  • discrete

  • sequential

  • diverging

Discrete palettes

  • Categorical data

  • Easily distinguishable

  • Favour no one colour

  • Vary H, constant C & L
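
A minimal sketch of the "vary H, constant C & L" recipe, using base R's `hcl()`. The function name `discrete_pal` and the particular chroma and luminance values are assumptions made for illustration.

```r
# Equally spaced hues around the colour wheel, with fixed chroma and luminance
discrete_pal <- function(n, chroma = 100, luminance = 65) {
  # n + 1 points from 15 to 375 degrees; drop the last, which repeats the first hue
  hues <- seq(15, 375, length.out = n + 1)[seq_len(n)]
  hcl(h = hues, c = chroma, l = luminance)
}

pal <- discrete_pal(4)
pal
```

Because only the hue changes, no colour is intended to look markedly brighter or more intense than the others. A quick way to inspect the result is `barplot(rep(1, 4), col = pal)`.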

Sequential palettes

  • Continuous data

  • Brightness & intensity of colour vary

  • Vary C & L, constant H
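
A sketch of a single-hue sequential palette built with base R's `hcl()`. The hue (260, a blue) and the chroma and luminance ranges are illustrative assumptions, and `luminance()` is a small helper defined here, not a library function.

```r
# Helper: approximate CIE luminance (L*) of each colour, via base R's convertColor()
luminance <- function(cols) {
  rgb01 <- t(col2rgb(cols)) / 255
  convertColor(rgb01, from = "sRGB", to = "Lab")[, 1]
}

# Constant hue; chroma falls and luminance rises from dark blue to pale blue
seq_pal <- hcl(h = 260,
               c = seq(80, 10, length.out = 7),
               l = seq(30, 90, length.out = 7))

round(luminance(seq_pal))  # increases steadily along the palette
```

Base R also ships ready-made palettes of this kind, for example `hcl.colors(7, "Blues 3")`.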

Sequential multi-hue palettes

  • Continuous data

  • Can vary everything if careful

  • Vary H, C & L
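
A sketch of a multi-hue sequential palette, again with base R's `hcl()`. The purple-to-yellow coordinates are illustrative assumptions; the point is that hue and chroma may change as long as luminance still changes in one direction.

```r
# Helper: approximate CIE luminance (L*) of each colour
luminance <- function(cols) {
  rgb01 <- t(col2rgb(cols)) / 255
  convertColor(rgb01, from = "sRGB", to = "Lab")[, 1]
}

# Hue sweeps from purple (300) to yellow (75) while luminance rises monotonically
multi_pal <- hcl(h = seq(300, 75, length.out = 7),
                 c = seq(40, 95, length.out = 7),
                 l = seq(15, 90, length.out = 7))

round(luminance(multi_pal))  # still ordered from dark to light
```

A ready-made palette in the same spirit is available as `hcl.colors(7, "Viridis")`.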

Diverging palettes

  • Continuous data where the mid-point is meaningful (e.g. 0)

  • Single hue in each arm

  • C & L are balanced in each arm

  • C goes to 0 at mid-point
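
The four points above can be sketched with base R's `hcl()`. The two hues (260 for blue, 10 for red) and the chroma and luminance ranges are illustrative assumptions.

```r
n_arm  <- 3                                   # colours per arm, excluding the mid-point
chroma <- seq(80, 0,  length.out = n_arm + 1) # chroma falls to 0 at the mid-point
lum    <- seq(35, 95, length.out = n_arm + 1) # same luminance steps in both arms

blue_arm <- hcl(h = 260, c = chroma, l = lum) # dark blue -> neutral
red_arm  <- hcl(h = 10,  c = chroma, l = lum) # dark red  -> neutral

# Join the arms at the shared neutral mid-point (dropping the duplicate)
div_pal <- c(blue_arm, rev(red_arm)[-1])
div_pal
```

With chroma at 0 the hue no longer matters, so both arms end on the same light grey. A ready-made alternative is `hcl.colors(7, "Blue-Red 3")`.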

Rainbow

#endrainbow

  • Luminance is not linear or even monotonic

  • Colour vision deficiency
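
The luminance problem can be seen numerically. This sketch uses base R's `rainbow()` and `convertColor()`; `luminance()` is a small helper defined here for illustration.

```r
# Helper: approximate CIE luminance (L*) of each colour
luminance <- function(cols) {
  rgb01 <- t(col2rgb(cols)) / 255
  convertColor(rgb01, from = "sRGB", to = "Lab")[, 1]
}

rb   <- rainbow(7)
L_rb <- luminance(rb)

round(L_rb)  # luminance of each rainbow colour
diff(L_rb)   # a mix of positive and negative steps: not monotonic
```

Luminance rises and falls along the palette, so equal steps in the data do not look like equal steps in brightness.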

Colour vision deficiency

Decreased ability to see colour or differences in colour

  • Red-Green CVD is sex-linked
  • Gene carried on X chromosome
  • Blue-Yellow CVD is not; chromosome 7
  • -anomaly (reduced sensitivity) vs -anopia (complete loss)

Pop

Can you see the green squares?

Preattentive pop-out

Some shapes, colours, and angles are easier to spot than others

This can happen before (or almost before) we consciously look at something

Pop — find the blue circle

Bad graphs

  • several stand-your-ground obama-graduates https://x.com/MathCurmudgeon/status/1828175452080320967

  • happiest countries https://x.com/TheEditorDiary/status/1742939764003893567

  • man vs animal https://x.com/geospacedman/status/1828708146786816476

  • renzo pie chart https://x.com/witconomist/status/1828570268635451491